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Abstract 

Time  series  modeling  as  the  sum  of  an  autoregressive  (AR)  process  and  sinusoids  is 
proposed.  When  the  AR  model  order  is  infinite,  we  call  this  the  Canonical  Autorcgres 
sive  Decomposition  (CARD)  and  is  equivalent  to  the  Wold  decomposition.  Maximum 
likelihood  estimation  of  the  sinusoidal  and  AR  parameters  is  shown  to  require  min¬ 
imization  with  respect  to  only  the  unknown  frequencies.  Although  the  estimation 
problem  is  nonlinear  in  the  sinusoidal  amplitudes  and  AR  parameters,  we  reduce  it  to 
a  linear  least  squares  problem  by  using  a  nonlinear  parameter  transformation.  A  gen¬ 
eral  class  of  signals  for  which  such  parameter  transformations  are  applicable,  thereby 
reducing  estimator  complexity  drastically,  is  derived.  This  class  includes  sinusoids  e« 
well  as  polynomials  and  polynomial-times-exponential  signals.  The  ideas  are  based 
on  the  theory  of  invariant  siibspaces  for  linear  operators.  CARD  serves  as  a  powerful 
modeling  tool  in  signal  plus  noise  settings  and  therefore  finds  application  in  a  large 
variety  of  statistical  signal  processing  problems.  We  briefly  discuss  some  applications 
such  as  spectral  analysis,  broad  band /transient  detection  using  line  array  data,  and 
fundamental  frequency  estimation  for  periodic  signals. 
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1  Introduction 


Many  problems  encountered  in  statistical  signal  processing  may  be  posed  as  one  that  at¬ 
tempts  to  decompose  a  time  series  into  its  signal  and  noise  components.  .A  common  example 
is  spectral  analysis  in  which  we  interested  in  extracting  sinusoidal  components  from  a  back¬ 
ground  of  noise.  More  generally,  this  is  the  problem  of  estimation  of  a  mixed  spectrum, 
one  that  is  composed  of  continuous  as  well  as  discrete  components.  Many  approaches  to 
this  problem  have  been  tried  but  the  results  have  been  generally  unsatisfactory.  The  time 
series  models  appear  to  be  either  sinusoids  in  white  noise,  used  in  eigenanalysis  approaches, 
or  colored  noise  only,  used  in  autoregressive  /  autoregressive  moving  average  modeling  for 
spectral  estimation. 

The  problem  of  statistical  inference  for  a  mixed  spectrum  is  not  new.  Early  attempts  by 
Whittle  can  be  found  in  [23]  and  a  more  general  description  in  [16].  The  methods  assume 
that  the  sinusoids  are  well  resolved  and  that  the  sinusoidal  parameter  estimates  are  not 
influenced  by  the  coloration  of  the  background  noise.  This  facilitates  the  implementation 
of  two  step  procedures  in  which  the  sinusoids  are  first  estimated  and  removed,  followed  by 
spectral  estimation  of  the  background  noise.  Such  an  approach  is  justified  by  appealing 
to  asymptotic  (as  the  data  record  length  increeises)  arguments.  While  the  ^lssumption  of 
sinusoidal  resolution  is  easily  understood  in  asymptotic  arguments,  the  second  assumption, 
i.e.,  that  the  possible  coloration  of  the  noise  can  be  neglected  (or,  the  noise  can  be  assumed  to 
be  white)  in  estimating  the  sinusoidal  parameters,  stems  from  a  result  in  [9].  The  reasoning 
is  that  the  asymptotic  performance  of  sinusoidal  parameter  estimators  based  on  colored 
or  white  noise  assumptions  are  identical.  However,  in  most  practical  problems  of  interest 
herein,  these  asymptotic  assumptions  are  invalid  due  to  the  availability  of  only  very  short 
data  records.  The  more  important  problem  from  a  practical  viewpoint,  is  to  be  able  to  jointly 
estimate  the  sinusoidal  components  as  well  as  the  background  noise.  We  propose  to  do  this 
by  employing  what  we  term  the  canonical  autoregressive  decomposition  (CARD)  model. 

It  should  also  be  noted  that  the  same  type  of  problem  arises  in  regression  analysis  [20]. 
There,  if  the  errors  are  not  white,  a  weighted  least  squares  estimator  is  preferred.  This, 
however,  assumes  knowledge  of  the  error  covariance  structure.  When  the  latter  is  unknown, 
the  usual  practice  is  to  iteratively  estimate  the  error  covariance  and  the  regression  coefficients, 
or  to  ignore  the  error  correlation  entirely  and  implement  an  ordinary  least  squares  estimator 
[-3, 15,  20).  Neither  approach  is  optimal  (for  the  given  problem)  in  any  sense.  Again  we  would 
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like  to  be  able  to  jointly  estimate  the  regression  parameters  {which  we  may  think  of  as  the 
signal)  and  the  error  (noise)  covariance  structure.  The  CARD  model  can  be  extended  to 
encompass  signals  (or  trends)  such  as  exponential,  polynomial,  sinusoidal,  or  combinations 
of  these.  These  results  can  be  used  to  solve  some  regression  problems  with  autocorrelated 
errors,  which  are  of  interest  in  the  social  and  behavioral  sciences  [20,  Chap.  6]. 

Finally,  the  key  to  the  usefulness  of  our  results  is  that  the  resulting  estimation  problem 
can  be  reduced  to  a  simple  linear  least  squares  problem  by  a  suitable  transformation  of  the 
parameter  space.  This  drastically  reduces  the  overall  complexity  of  the  estimation  problem. 

The  paper  begins  with  an  introduction  to  canonical  representations  for  wide-sense  sta¬ 
tionary  (WSS)  processes  based  on  the  Wold  decomposition.  For  most  practical  purposes, 
any  WSS  random  process  can  be  represented  as  an  infinite-order  autoregressive  process  plus 
a  sum  of  sinusoids.  We  term  this  the  canonical  autoregressive  decomposition.  Maximum 
likelihood  estimation  of  the  unknown  parameters  in  the  CARD  model  is  the  subject  of  the 
next  section.  While  this  seems  to  be  a  highly  nonlinear  estimation  problem,  we  reduce  its 
complexity  by  applying  a  nonlinear  parameter  transformation.  Related  results  may  also  be 
found  in  [1,  2,  14,  21,  22],  The  problem  is  finally  reduced  to  maximization  with  respect 
to  only  the  unknown  sinusoidal  frequencies.  The  following  section  contains  generalizations 
of  these  ideas  to  handle  deterministic  signals  (other  than  sinusoids)  in  autoregressive  noise. 
These  signals  include  complex  exponentials,  polynomials,  polynomials-times-exponentials. 
We  prove  a  theorem  which  yields  all  possible  signal  types  for  which  our  parameter  trans¬ 
formations  simplify  maximum  likelihood  estimation.  This  generalizes  the  known  results  for 
non-zero  mean  [1,  p.  200],  pure  sinusoids  [2,  14,  22],  complex  exponentials  [21],  polynomi¬ 
als  [4],  In  the  fifth  section,  we  briefly  touch  upon  some  signal  processing  applications  such 
as  broadband/transient  detection  using  line  array  data,  fundamental  frequency  estimation, 
spectral  analysis,  etc.  Numerical  examples  to  illustrate  some  applications  are  also  presented. 

2  Canonical  Autoregressive  Decomposition 

The  canonical  autoregressive  decomposition  (CARD)  model  is  based  on  the  Wold  decompo¬ 
sition,  the  Lebesgue  decomposition,  and  the  Kolmogorov  linear  prediction  theory.  We  give 
only  the  essential  theorems  to  motivate  our  use  towards  the  CARD  model.  The  interested 
reader  may  consult  [5,  13,  16,  19]  fo?  further  details. 

By  the  Wold  decomposition  theorem,  any  wide-sense  stationary  random  process  can  be 
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expressed  as. 


,r[r?]  =  .':[?}]  +  ie[n 


where 

1.  s[n)  and  rw[n]  are  uncorrelated  processes 

2.  s[n]  is  a  singular  process  in  that  it  can  be  perfectly  predicted  by  a  linear  combination 
of  its  past  values. 

3.  r«[n]  is  a  regular  process  (cannot  be  perfectly  predicted  by  a  linear  combination  of  its 
past  samples)  having  a  moving-average  representation 

u![n]  =  ^  b[k]u[n  -  k]  (2) 

k=b 

oo 

with  I  being  a  white  noise  process  uncorrelated  with  s[n]. 

k=Q 

By  the  Lebesgue  decomposition  theorem,  the  spectral  distribution  function  (which  may 
be  thought  of  as  the  integrated  power  spectral  density)  of  any  wide-sense  stationary  random 
process  can  be  decomposed  as 

S{f)  =  S^{f)-kS2{f)  +  S,if)  ,  (3) 

where  5i(/)  is  an  absolutely  continuous  distribution  function,  S2{f)  is  a  step  function  with 
steps  Pi  at  frequencies  /,,  and  53(/)  is  a  singular  function.  For  all  practical  purposes,  the 
third  component  S^if)  can  be  ignored  (13,  16].  By  the  Wiener-Khintchine  theorem,  the 
autocorrelation  function  corresponding  to  S2{f)  is 

r,H  =  /* 

•'"2 

1=1 

or  S2[n]  is  a  sum  of  sinusoids  and  is  perfectly  predictable.  Finally,  the  absolutely  continuous 
component  could  represent  either  a  regular,  or  singular  process,  depending  on  whether  [16], 
(23,  App.  2] 

l^lnS',UW>-oc  (4) 
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holds  or  not,  respectively.  Here  /  denotes  dilFerentiation.  so  that  S[{f)  denotes  the  PSD 
corresponding  to  the  absolutely  continous  component  in  (3).  For  the  most  part,  the  singular 
process  therein  corresponds  to  perfect  prediction  based  on  strictly-infinite  past.  Examples 
of  such  processes  include  perfectly  band-limited  processes.  By  assuming  that  the  absolutely 
continuous  component  represents  a  regular  process,  i.e.,  (4)  is  satisfied  (which  identifies  Si(f) 
in  (3)  with  u;[n]  in  (1)),  we  rewrite  (1)  as 

a;[nl  =  ^2  ~  ^1  XI  (-5) 

k=0  i=l 

where  i4,’s  are  zero  mean,  complex-valued,  random  variables  uncorrelated  with  each  other 
and  with  the  u[n]’s,  and  €{\  A,  p)  =  Pi. 

The  CARD  model  is  finally  obtained  by  noting  that,  under  some  conditions,  an  infinite- 
order  moving  average  process  is  equivalent  to  an  infinite-order  autoregressive  process.  That 
this  is  not  true  in  general  is  illustrated  by  [13]  the  moving  average  (MA)  process 

r«[n]  =  t<[n]  —  u[n  —  1]  , 

which  does  not  have  a  autoregressive  (AR)  representation.  The  reason  stems  from  the  fact 
that  the  MA  process  has  a  zero  on  the  unit  circle,  so  that  the  transfer  function 

1  1 
W)~T^  ’ 

is  not  analytic  for  2  >  1.  To  avoid  this  problem,  we  assume  that  the  power  spectral  density 
(PSD)  of  the  MA  component  in  (5)  is  bounded  away  from  zero,  or  that  Px{f)  >  C  >  0  for 
all  /.  Then,  (4)  is  satisfied.  This  assumption  is  not  overly  restrictive  in  that  all  physical 
processes  are  subject  to  observation  noise,  causing  the  PSD  to  be  strictly  positive.  Note 
that  the  general  conditions  for  existence  of  infinite-order  AR  representations  are  extremely 
complicated  [13]. 

In  summary,  we  have  the  following  results 

Theorem  1  If  x[n]  is  any  wide-sense  stationary  process  with  PSD  Px{f)  >  C  >  0  / 

so  that  equation  (4)  is  satisfied,  then  it  can  be  decomposed  as 

x[n]  =  u;[n)  -h  ^  ,  (6) 

1=1 
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(7) 


where  ti;[n]  is  an  infiniie-order  AR  process  given  by 

OC, 

u![n]  =  —  ^  a[^’]iy[n  —  Xr|  +  u[n|  . 

*r=l 

The  u[n]  process  is  white  noise  with  variance  a' ,  and  uncorrelated  with  the  .4,  ’s.  The  Ai ’s  are 
zero  mean,  complex-valued,  uncorrelated  random  variables  with  variances,  S{\  Ai  1^)  =  P,. 


As  a  result  of  this  decomposition,  termed  the  canonical  autoregressive  decomposition,  we 
have  the  PSD  ^ 


where  a[k\e~^‘^^^'’  and  a{0]  =  1.  The  word  “canonical”  stems  from  the  fact 

k=Q 

that  such  a  decomposition  can  represent  the  second-order  moment  properties  of  any  physical 
process  encountered  in  practice.  Lastly,  in  most  applications,  we  will  assume  that  the  order 
of  the  AR  process  is  finite  so  that  in  (8)  will  be  =  1-1- 

Hence,  the  unknown  parameters  are  {p,  a[l],  a(2],  . . . ,  a(p],  s,  Pi,  fi,  . . . ,  P,,  /,}. 


3  Parameter  Estimation 

We  will  now  derive  an  approximate  maximum  likelihood  estimator  (MLE)  of  the  CARD 
model  parameters.  It  is  assumed  that  the  AR  process  is  Gaussian  but  that  the  sinusoidal 
amplitudes  are  deterministic.  Instead  of  attempting  to  estimate  the  powers  Pi  in  the  CARD 
model,  we  modify  the  problem  by  assuming  that  the  A,’s  are  unknown,  deterministic  con¬ 
stants.  The  reason  for  this  is  that  the  MLE  for  the  original  problem  is  highly  intractable.  Fur¬ 
thermore,  since  only  a  single  realization  of  the  time  series  is  available,  it  makes  no  sense  to  es¬ 
timate  the  variances  Pi,  which  are  ensemble  averages  (the  MLE  will  be  inconsistent).  Finally, 
we  will  also  assume  that  the  number  of  sinusoids  (s)  and  the  AR  model  order  (p)  are  known. 
Therefore,  the  unknown  parameters  are  {afl],  a(2],  . . . ,  a[p],  cr^,  Ai,  /i,  . . . ,  A,,  /,}. 

To  summarize  the  estimation  problem,  we  assume  the  time  series  model  to  be 

x[n]  =  s[n)  -f  w[n]  ,  (9) 

where 

=  ,  (10) 

ia:l 
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U'[t}]  =  -  ^  (i[lc]  tr[n  -  k-]  +  »[n]  .  (11) 

k=i 

Here  u[n]  is  complex,  white  Gaussian  with  variance  cr^  so  that  ie[n],  and  hence  x{n],  is  com¬ 
plex  Gaussian.  The  unknown  parameters  are  Q  =  cr^j  ,  where  A  =  [Aj  A2  ■  ■  ■  A,]^, 

f  =  [/i /2  • --/jr,  and  a  =  a[l] a[2] . . . a[p]  . 

. 

r 

Assuming  that  a  finite-length  data  record  =  i[0]a-[l] . . .  ar[A  —  1]  is  available,  the 
approximate  (actually  conditional)  PDF  (8,  10]  is  given  by 

I  f  j  N-l  p  j21 

P(Xo;0)  =  - - ^exp^-—  5^  5^a[/:](i[n. -A:]-s[n-;'])  [  ,  (12) 


(TTcr^y 


^  n=p  ljt=0 


where  N'  =  N  —  p  and  a[0]  =  1.  Maximizing  the  PDF  with  respect  to  cr^  leads  to 


so  that  to  find  the  MLE  of  6'  {6  without  cr^),  we  must  now  minimize  J{-).  Using  (10),  we 
rewrite  J  as 


n=p  U=0 


(14a) 


n=p  k=0  t=l  fc=0 

n=P  fc=0  «=1  \fc=0  / 

=  E  > 

n=p  k=0  t=l 


(14b) 


where  /«,■  =  —  A,-  ^  a[k]e~^^*^'^.  Clearly  J  in  (14b)  is  quadratic  with  respect  to  the  a[/b]’s 

k=0 

and  Pi's.  Thus  a  nonlinear  (due  to  interaction  between  afi]  and  A,-  in  (14a))  least  squares 
problem  is  reduced  to  a  linear  least  squares  problem  by  the  nonlinear  parameter  transfor¬ 
mation, 

=  ■  (15a) 
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The  equivalence  of  the  problems  in  (lln)  and  (14/))  via  the  para.reter  transformation  is 
illustrated  in  Figure  1. 


(a)  Approximate  MLE  for  Original  Parameters 


Minimize  over 

-►  a 

to  find  MLE 


^{n]  =  EUx 


Hi 


(b)  Approximate  MLE  for  Transformed  Parameters 
Figure  1:  Approximate  MLE  and  Sinusoidal  Amplitude  Transformation 
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Note  that  — /i,  can  be  thought  of  as  the  amplitude  of  the  prevvhitenene<i  sinusoidal  signal, 
the  prewhitener  being  implementable  if  the  AR  parameters  were  known.  The  transformation 
from  A,  to  /r,  is  one-to-one  iff  the  .AR  process  does  not  have  poles  on  the  unit  circle  (which 
of  course  it  does  not),  in  which  case  we  obtain 

/  /T  ^ 

Ai  =  -  n. 


Now,  define 


X  =  x[p]  x[p  -f  1]  '  •  •  x[iV  —  1] 

I' 

x[p-l]  x[p-2]  ••• 

x[0] 

H  = 

"  S 

..  1 

x[l] 

x[iV-2]  x[iV-3)  ••• 

x[A^  —  1  —  p] 

gj2ir/lP  gj2jr/2P 

gj  2  nj.p 

gj2)r/i(p-H)  gj2ir/j(p-H) 

...  gj2jr/,(p-(-l) 

E  = 

;  ; 

; 

gi2;r/,(jV-l)  ^32irh{N-l) 

...  gj2»A(^-l) 

(;V  -  p)  p 


{N  —  p)  X  s 


(1.5b) 


(16) 


so  that  (14b)  becomes 

J'(a,/.,f)  =  ||x  -f-  Ha  +  E^||2  ,  (17) 

where  p  =[piP2  >  and  |l  •  1|  denotes  the  Euclidean  norm  of  a  complex  vector.  It  can 

be  shown  that  the  concatenated  matrix  [H  E]  is  full  rank  with  probability  one  if  the  data 
conforms  to  the  assumed  model  and  N  >  2p  +  s,  hence  a.  unique  minimum  exists  for 
Minimizing  J'  with  respect  to  a  and  p  leads  to  (see  Appendix  A  for  details) 

J"(f)  =  x"(P^-P^H(H"PiH)‘'H'^P^]x  (18a) 

=  x"[Pi^-P^E(E"P^E)"'E"P;^]x  (18b) 

where  P^  =  I  -  E  (e"E)~'  E^  and  P^  =  I  -  H  (h"H)'‘  H"  are  projection  matrices. 
All  the  required  inverses  will  exist  since  (H  E]  is  full  rank  with  probability  one.  The  corre¬ 
sponding  a  and  p  are  given  by  either 

a  =  -  (H«P^H)"'H"Pjx 

p  =  -  (E"E)'*  E''(x  +  IIa)  , 


9 


(i9a) 

(19b) 


or 


(20a) 

(20b) 


t*  - 


(E^P^E)  E^Pf,x 

(h'^h)''h^'(x  +  E4; 


The  estimation  of  the  unknown  frequencies  is  accomplished  by  minimizing  J"(  )  in  (18)  and 
will  in  general  require  iterative,  nonlinear  optimization  techniques.  Using  these  estimated 
frequencies  in  (19)  or  (20),  and  (15b),  (1.3),  the  other  parameters  can  be  obtained.  Equations 
(18)  -  (20)  have  some  interesting  !r":;rpretations.  The  estimate  a  in  (19a)  can  be  thought 
of  as  the  usual  covariance  method  after  subtractinq  the  signal  components  (via  Pg)  in  the 
data  vector  x  and  the  data  matrix  H.  The  resulting  minimum  value  for  the  prediction  error 
is  given  in  (18a),  which  must  be  minimized  to  obtain  the  unknown  frequencies.  Finally,  /I 
in  (196)  is  the  signal  amplitude  estimate  based  on  the  ‘prediction  error’  x  +  Ha.  Similar 
interpretations  are  possible  for  the  other  expressions  wherein  the  Pfj  operator  is  interpreted 
as  an  AR  prewhitener,  but  one  that  is  based  on  signal  +  noise. 

We  now  give  an  example.  Assume  that  we  wish  to  estimate  the  frequency  of  a  complex 
sinusoid  in  AR  noise.  Then,  using  s  =  I  in  (18b),  we  get  (since  E^P^E  is  a  scalar) 


r{f)  =  x''pfex  - 


where  e=  eJ2>^Ap+i)  ...  ^  P^  =  I  -  H  (h"h)"' and  x,  H  are  given 

in  (16).  Hence  an  approximate  MLE  of  the  frequency  must  maximize 


C(/)  = 


e^PjjX  1^ 
e^P^e 


This  has  some  interesting  interpretations.  Let  Uq  =  Pj^x  =  x  +  H^,  where  ao  is  the  usual 
least  squares  estimator  (covariance  method)  of  the  AR  parameters,  and  u  =  x  +  Ha,  where 
a  is  the  estimate  of  the  AR  parameters  given  by  (206).  Clearly,  both  are  estimates  of  the 
driving  noise  sequence.  As  shown  in  Appendix  B,  we  can  rewrite  (21)  as, 

(e^Uo)  e^u 

C(/)  =  - -h- -  •  (22) 


Noting  that  for  low  SNR,  ^  %  a,  so  that  Uo  u,  we  obtain 

I’ 

=  -r-;;IW)P 

jV  —  n 


UoU) 
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where  Uoif)  =  denotes  the  Fourier  transform  of  the  driving  noise  sequence  estimate. 

Neglecting  t'nd  effects  (for  .V  >  p).  we  get  fo(/)  =  ^  )X{f),  where  = 

1  +  Ylk-i  ao[^'|e“-'^’^ and  .V(/)  =  denotes  the  Fourier  transform  of  i[nj. 

Hence  we  obtain. 


C(/)  ~ 

or  equivalently,  we  may  maximize 

Cif) 


I  XU)  /N 

aj/ I  A(e''")  P  ’ 


where  is  the  NILE  of  the  driving  noise  variance  assuming  no  signal  is  present.  But, 
the  numerator  is  just  the  periodogram  lAf)  the  denominator,  the  estimated  AR  PSD 
^arH)-  Hence  for  low  SNR,  the  approximate  MLE  maximizes 


Cif) 


hif) 

ParU)  ' 


We  may  view  the  procedure  as  either  prewhitening  the  data  followed  by  peak  picking  the 
periodogram,  or  normalizing  the  periodogram  by  the  background  noise  followed  by  peak 
picking. 


4  Extensions  to  Other  Signals 

The  key  to  analytical  solutions  in  the  previous  section  was  the  parameter  transformation 
{15a)  which  rendered  the  problem  linear  in  the  new  parameters.  A  careful  consideration  of 
the  transformation  reveals  that  this  was  possible  because  sinusoids  are  eigenfunctions  of  a 
linear  time-invariant  (LTI)  FIR  filter.  If  the  input  to  an  LTI  FIR  filter  is  a  causal  sinusoidal 
sequence,  then  the  output  (after  disregarding  the  initial  samples)  is  also  sinusoidal,  but  with 
a  different  amplitude.  More  generally,  if  the  input  sequence  belongs  to  a  subspace  spanned 
by  sinusoidal  sequences  (as  in  (10)  *,  the  output  sequence  also  lies  in  the  same  subspace. 
This  property  is  an  instance  of  the  ;  o-called  subspace  invariance  of  linear  transformations 
[6,  ”].  However,  subspaces  spanned  by  eigenfunctions  are  only  the  most  primitive  of  invariant 
subspaces;  more  generally  they  involve  generalized  eigenfunctions  and  Jordan  chains  [7]. 

We  now  state  our  results  which  extend  the  case  of  sinusoids  in  AR  noise  to  more  general 
signals  in  AR  noise.  Seme  of  these  signals  are 
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1.  dampod  and  undamped  sinusoids:  .s[n]  =  5Zi=i 

2.  polynomials:  s[nj  =  -d,  /i‘~' 


3.  poiynomials-timcs-exponcntiais:  s[;ij  =  r''  iY.i=\  -di  ?i'  '). 


or  their  linear  combinations. 

To  do  so.  we  present  some  definitions  follower!  by  our  theorems  (the  proofs  of  which  are 
in  Appendices  C  and  D).  The  definitions  and  theorems  are  designed  to  answer  the  following 
question; 

For  s[n]  given  by 

1=1 

where  s,[n]  are  known  and  the  amplitudes  are  unknown,  we  wish  to  replace  the  minimization 
of 

Ji^')  =  E  X]a[^](-r[n  -  A-]  -  s[n  -  fc]) 

n=p  fcsO 


/V-1 

=  Z 

rv=p 


Z  a[k]x[n  -k]  -  a[i-]  ^  A,s,[n  -  k] 

fe=0  Jk=0  1  =  1 


by  the  minimization  of 


N-l 


J^e')  =  E 


n=p 


^  a[fc)x[n  -  A:]  +  ^^,5,[n] 


fc=0 


i=i 


Hence,  we  need  to  have  that  for  all  A,  a. 


E  «(^i  -d.^.fn  -  *]  =  -  , 


kr=0  1=1 


i=i 


(23) 


for  n  =  p,  p  +  1,  . . . ,  (Af  —  1).  To  be  useful,  we  require  the  transformation  in  (23)  to 
hold  for  any  N  and  any  p  for  which  the  maximum  likelihood  problem  is  well  defined.  In 
effect,  we  require  that  any  linear  combination  of  the  basis  signals  {^ifn],  52[n],  ...,  s,[n]} 
when  transformed  by  the  linear  time-invariant  filter  A{z)  =  1-1-  must  produce 

a  signal  that  is  again  a  linear  cor.ibination  of  the  basis  signals,  {si[n],  5;[n],  ...,  s,(n|}. 
Symbolically,  letting  L  denote  the  linear  filter  operation  by  >1(2),  we  require  that 

L  (53  for  p  <  n  <  /V  -  1  .  (24) 

I 1=1  J  i=i 
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We  now  find  all  possible  basis  signal  sets  for  which  (24)  holds.  Note  that  it  mu.st  ’  ^Id  for 
all  L,  or  equivalently  for  ail  a.  The  B,'s  will  of  course  depend  on  the  values  of  a  anu  A. 

Definition  1  denotes  the  linear  vector  space  of  all  lengfh-N  sequences.  .4n  element  in 
the  space  will  be  denoted  by  2[0],  2(1],  . . . ,  cfiV  —  1). 

Definition  2  Let  C  denote  the  set  of  all  p-th  order  linear  transformations  on  Cn.  An 
element  in  C,  described  by  the  parameters  a(l],  a[2|,  a[p],  with  a\p\  ^  0,  operates  on  an 

element  2(0],  2(1],  . . . ,  2[A^  —  1]  in  to  produce  an  element  yip],  y\p  +  1],  . . . ,  yfA  —  1]  in 
Cfi.p  according  to 

p 

j/[n]  =  2[n]  +  ^  a[i]2[n  —  k\  ,for  p  <  n  <  N  —  I  (25) 

i=i 

Definition  3  A  p-shift  operator  V{-)  operates  on  an  element  2[0|,  2[1],  . . . ,  2[A^  —  1]  in 
to  produce  an  element  z\p\,  2[p+  1],  •  •  • ,  —  1]  in  Cs-j>-  V  operates  similarly  on  subspaces 

in  C/v  • 


Definition  4  A  subspace  $  in  Cn  is  said  to  be  invariant  with  respect  to  C  if  L  :  5  — ♦  V{S) 
for  any  L  ^  C. 


With  these  definitions,  we  have  our  main  results.  Some  related  results  for  p  =  1  are 
alluded  to  in  (7,  p.  200j. 


Theorem  2  The  invariant  subspace  S  of  dimension  M  <  N  —  2p  A-  \  is  composed  of  r 
invariant  subspaces  Si,  Sr  where  S  =  Si®S2®---®Sr,  and  0  denotes  the  direct 

sum  of  the  subspaces.  The  basis  for  Si,  which  is  of  dimension  mi,  is 


K 


(26) 


for  n  =0,  1,  . . . ,  (iV  —  1),  and  where  X-  is  any  nonzero  complex  number  and  nij  =  A/. 


Proof  .  See  Appendix  C.  o 

Note  that  for  the  problem  of  (17)  to  be  well  defined,  we  must  have  s  <  N  —  2p.  This  is 
more  restrictive  than  Theorem  2  (where  M  <  N  —  2p  -r  1),  so  that  the  parameter  transfor¬ 
mation  itself  imposes  no  new  problem  constraint. 
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Theorem  3  Let  5[n]  belong  to  the  invariant  subspace  S.  From  Theorem  2.  the  signal  is  of 
the  form 


r  m,  —  1 

1=1  j=0 


(■) 


n 


(27: 


where  are  arbitrary  constants.  Then,  for  all  p~th  order  AR  processes  with  poles  not 
located  at  Xi,  A2,  . . . ,  A^,  we  have  that,  for  p  <  n  <  jV  —  1, 


p  r  mj  — 1 

£  =  Z  L 

k=0  i-l  j=0 


n 

J  . 


A" 


n-] 


The  Dj'^  are  found  using,  D,  =  T,B«  .  for  i  =  I,  2,  . . . ,  r,  where 

D,  =  cl;!.,]’'  . 


J,= 


T,  =  E  . 

Jt=0 

A.  1  0  •••  0 

0  A,  1  ...  0 

0  0  0  •••  A, 

B,  =  [ei''  Bj'' .  ■ .  fll;!_, 


m,  X  m, 


(28) 


(29) 


Proof:  See  Appendix  D.  o 

Note  that  the  transformation  from  B,  to  is  invertible  since  T,  is  nonsingular.  This  is 
because  J~*  is  upper-triangular  with  A~*'  along  the  main  diagonal,  so  that  T;  is  also  upper- 
triangular  with  along  its  main  diagonal.  Since  the  AR  process  does  not  have 

poles  at  the  A/s,  T,  will  be  nonsingular.  This  means  that  the  le^lst  squares  minimization 
can  be  done  with  respect  to  D,’s. 

We  now  present  some  examples  and  then  discuss  the  restrictions  on  the  modes,  A,’s. 
Examples: 


1.  Sinusoids  and  damped  sinusoids: 

Let  mj  =  1  and  A^  =  in  (27).  Then,  we  get 

s(n]  = 

1=1 
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2.  Polynomials: 

Let  r  =  1,  mi  =  A/,  and  Ai  =  1  in  (27).  Then,  we  get 


J=0  \J/ 


n 


n 


But,  ^  .  j  =  n\/j\{n  —  j)\  =  n(n  —  1)  ■  ■  ■  (n  —  j  +  1)/;!,  so  that  ^  .  j  is  seen  to  be  a 
polynomial  of  degree  j.  It  can  be  shown  that 

M-l 

5[n]  =  ^  By  . 

j=o 

This  case  was  originally  discovered  by  Djuric  [4]. 


3.  Polynomials-times-Exponentials: 

Let  r  =  1,  mi  =  Af  and  Ai  =  in  (27).  Then,  we  get 

»N= 

or 

Af-l 

s(n]  =  ^  By  . 

j=o 

Also,  any  linear  combination  of  these  signals  is  possible. 

Once  the  signals  have  been  chosen,  the  general  form  of  the  output  signal  is  known  except 
for  the  amplitudes,  which  are  to  be  estimated.  To  recover  the  input  signal  amplitudes,  we 
use  from  Theorem  3, 

B.  =  T-‘D.  . 

Since  T,  is  upper-triangular  with  along  the  diagonal,  T,  will  be  non-singular 

iff 

>l(\)  =  1  +  E  -ItlV*  *  0  ,  (30) 

fc=l 

Therefore,  for  the  transformation  in  (29)  to  be  invertible,  the  zeros  of  the  filter  A{z),  i.e., 
the  poles  of  the  AR  process,  should  not  lie  at  the  signal  modes.  Otherwise,  A{z)  annihilates 
the  corresponding  signal  mode  and  the  original  amplitude  is  not  recoverable.  In  section  3, 
we  had  s[n]  =  so  that  (30)  meant  that  ^(e-’^"^')  ^  0,  or  the  poles  of  the 

noise  process  should  not  coincide  with  the  sinusoidal  frequency  locations. 
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5  Some  Applications 


The  CARD  model  is  useful  in  that  it  reduces  a  nonlinear  minimization  problem  involving  the 
AR  parameters,  signal  amplitudes,  and  signal  modes,  to  one  involving  only  the  signal  modes. 
Additionally,  because  of  its  form,  the  signal  and  noise  are  independently  parameterized,  the 
Fisher  information  matrix  is  block  diagonal  [24].  This  has  important  implications  because  it 
says  that  the  Cramer-Rao  (CR)  bound  for  the  signal  parameters  is  the  same  for  the  cases  of 
knoNvn  AR  parameters  and  unknown  AR  parameters.  For  instance,  in  the  spectral  analysis 
problem  of  section  3,  the  CR  bound  for  the  sinusoidal  parameters  can  be  found  in  [18], 
although  it  was  not  as  thoroughly  investigated  as  the  white  noise  case.  When  the  CR  bound 
is  attained,  the  estimation  performance  is  as  good  as  if  the  AR  parameters  are  known.  In 
detection  applications,  this  translates  into  optimal  detection  in  unknown  AR  noise  [11]. 

Although  the  number  of  applications  of  the  CARD  model  and  its  extensions  is  quite 
large,  we  now  briefly  describe  some  of  current  interest  to  the  signal  processing  community. 
They  are 

1.  spectral  analysis 

2.  broadband  /  transient  detection 

3.  fundamental  frequency  estimation 

A.  Spectral  Analysis:  Spectral  analysis  of  time  series  using  the  CARD  model  can  be  imple¬ 
mented  if  the  number  of  sinusoids  and  the  order  of  the  AR  process  are  known.  Otherwise, 
some  means  of  model  order  selection  must  be  employed.  We  are  currently  investigating 
this  problem.  Assuming  known  model  orders,  the  sinusoidal  frequencies  can  be  estimated 
by  minimizing  J"(-)  in  (18).  Then  the  AR  parameter  and  sinusoidal  frequency  estimates 
are  found  from  (19)  or  (20).  The  minimization  of  J"  will  be  difficult  for  more  than  a  few 
sinusoids,  so  that  we  are  currently  investigating  various  approaches. 

We  now  present  some  numerical  examples  using  N  —  25  data  samples  consisting  of  two 
sinusoids  in  colored  2"'*  order  AR  process.  Three  generic  spectral  estimators  are  considered, 
one  is  a  purely  continuous  PSD  estimator,  the  second  a  purely  discrete  (line)  spectral  esti¬ 
mator,  and  the  third  is  a  mixed  spectral  estimator  given  by  the  CARD  model.  The  purpose 
is  to  provide  some  brief  numerical  comparisons  on  how  the  CARD  model  estimates  compare 
with  usual  methods  when  (i)  an  estimate  of  the  entire  spectrum  is  of  interest,  and  (n)  only 
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the  frequency  estimates  are  of  interest.  The  purely  continuous  PSD  estimator  chosen  here 
is  a  12th  order  AR  spectral  estimate  computed  using  the  modified  covariance  method  [10]. 
This  is  a  common  choice  in  such  mixed-spectrum  problems.  The  line  spectral  estimator  was 
obtained  by  minimizing  (186)  with  p  =  0  and  s  =  2,  and  is  a  implementation  of  the  MLE 
for  the  sinusoids  in  white  noise  problem  [10],  Strictly  speaking,  as  an  estimate  of  the  overall 
spectrum,  this  (in  itself)  will  provide  a  very  poor  estimate,  and  higher  orders  (i.e.,  s  >  2) 
will  be  needed  for  this  purpose.  In  applications  wherein  only  the  frequencies  are  of  interest, 
this  (along  with  AR  spectral  analysis)  is  the  most  common  implementation.  Finally,  the 
CARD  estimate  is  obtained  by  minimizing  (186)  with  p  —  2  and  s  =  2.  The  AR  param¬ 
eter  estimates  in  CARD  are  then  computed  using  the  estimated  frequencies  in  (19a).  The 
minimization  of  (186),  in  both  estimators,  was  carried  out  by  implementing  a  500  x  500  grid 
search  for  the  unknown  frequencies.  The  sinusoidal  components  are  indicated  by  a  single 
line  of  height  (the  true  values  of  Ai,  are  used,  not  their  estimates)  located  at 

the  estimated  frequency.  The  display  is  therefore  indicative  of  the  sinusoidal  power  relative 
to  the  noise  PSD.  Typically,  several  realizations  of  these  estimators  will  be  plotted  so  that 
small  variations  in  the  frequency  estimates  cause  the  resulting  lines  to  appear  adjacent  to 
each  other  and  thereby  forming  a  narrow  band. 

As  the  first  example,  a  narrowband  AR(2)  process  with  poles  at  0.95  and 

0.95  e'2  7ro.2  3^  driving  noise  variance  of  =  1.0  is  considered.  The  sinusoidal  fre¬ 
quencies  were  fi  =  0.60  and  /2  =  0.62,  and  the  amplitudes  were  Ai  =  =  l.O.  The  /ocoi 

signal  to  noise  ratio  (SNR),  given  by  N A] / P^{fi),  and  broadband  SNR,  given  by  A;/r,u[0], 
are  about  25  dB  and  —17.5  dB,  respectively.  Figure  (2a)  shows  the  CARD  spectral  estimate 
for  10  independent  realizations  of  the  data  set.  The  line  spectrum  for  the  same  data  set  is 
given  in  Figure  (26).  The  frequency  estimates  are  highly  biased  (as  expected,  the  model  is 
incorrect)  and  are  very  close  to  the  peaks  in  the  noise  spectrum.  This  is  attributed  to  the 
extremely  low  sinusoidal  power.  The  AR  spectral  estimate,  shown  in  Figure  (2c),  exhibits 
a  number  of  false  peaks  (as  expected,  see  [10]),  and  sinusoidal  resolution  is  poor.  Finally, 
the  average  CARD  spectrum  (computed  using  50  trials)  is  shown  in  Figure  (2d)  and  is  very 
close  to  the  true  PSD. 

To  illustrate  the  effect  of  sinusoidal  frequency  location  relative  to  the  noise  peaks,  we 
change  the  sinusoidal  frequencies  to  fi  =  0.30  and  =  0.32.  The  local  SNR  reduces  to 
about  11.5  dB,  which  is  too  low  for  all  three  algorithms.  The  local  SNR  can  also  be  thought 
of  as  iV  I  Pi  I*  where  pi  is  the  ‘‘prewhitened’’  signal  amplitude  (see  (15a)).  Hence  by 
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changing  the  frequencies  we  are  effectively  reducing  the  SNR  that  dictates  performance  of 
the  CARD  model.  The  amplitudes  were  increased  to  Aj  =  A2  =  10,  for  which  the  local  SNR 
was  about  31.5  dB  and  the  broadband  SNR  was  about  2.5  dB.  Figures  (3a)  -  (.3c)  show  the 
three  spectral  estimators  for  10  independent  realizations.  Since  the  frequency  estimates  in 
Figure  {3b)  are  a  little  unclear,  Figures  (.3d)  and  (3e)  show  the  frequency  estimates  using 
CARD  and  line  spectral  estimators  for  50  independent  realizations.  The  bias  in  the  line 
spectral  estimate  for  fi  is  towards  the  noise  spectral  peaks,  while  that  for  /2  is  towards  0.31. 
No  such  systematic  bi«is  is  noticable  in  the  CARD  estimates.  The  average  CARD  spectrum 
(computed  using  50  trials)  is  shown  in  Figure  (3/)  and  is  quite  close  to  the  true  PSD. 

It  is  clear  that  the  improvement  (vis  a  vis  estimator  bias)  in  the  CARD  frequency  ^timate 
over  the  usual  frequency  estimate  (i.e.,  minimizing  (186)  with  s  =  2  and  p  =  0)  is  not  as 
drastic  as  in  the  previous  example.  Loosely  speaking,  the  latter  estimator  searches  for  total 
spectral  power  in  narrow  bands  (sinusoidal  power,  noise  peak  power,  etc.)  Hence,  it  will 
yield  reasonable  (low  bias)  frequency  estimates  when  the  sinusoidal  power  is  much  larger 
or  comparable  to  the  noise  peaks  (as  in  Example  2),  and  will  yield  poor  (highly  biased) 
frequency  estimates  when  the  noise  peaks  are  much  larger  than  the  sinusoidal  powers  (as  in 
Example  1).  An  analysis  of  the  CR  bound  brings  out  the  dependence  of  the  CARD  estimator 
performance  (via  variance)  on  the  local  SNR  as  well  as  A]fP^{f),  where  f  denotes  derivative 
with  respect  to  /,  and  is  currently  in  progress. 

A  third  example  using  broadband  AR  noise  is  now  considered.  It  illustrates  the  effec¬ 
tiveness  of  the  CARD  model  for  narrowband  +  broadband  spectra,  which  is  difficult  to 
implement  using  other  modern  spectral  analysis  methods  [10].  The  AR(2)  poles  are  located 
at  0.70  and  0.70  and  the  driving  noise  variance  was  =  0.5.  The  two  sinu¬ 

soidal  frequencies  were  fi  =  0.60  and  fz  ~  0.62,  and  the  amplitudes  were  Ai  —  A2  —  1.0. 
The  local  and  broadband  SNRs  are  about  25  dB  and  5  dB,  respectively.  Figure  (4a)  shows 
the  CARD  spectral  estimate  for  10  independent  realizations  of  the  data.  The  line  spectral 
estimate  is  given  in  Figure  (46)  and  shows  one  sinusoid  near  0.61  and  the  other  sinusoid  in 
the  general  vicinity  of  the  noise  power  concentration  (about  0.05  to  0.25).  The  AR  spectral 
estimates  in  Figure  (4c)  still  show  a  large  number  of  false  peaks,  but  the  sinusoidal  resolution 
is  a  little  better  (as  compared  to  Example  1)  probably  because  the  broadband  SNR  is  a  little 
higher.  Finally,  the  averaged  CARD  estimate  (computed  using  50  trials)  is  shown  in  Figure 
(4d)  and  is  quite  close  to  the  true  PSD. 

B.  Broadband  Signal  Detection:  The  difficulty  of  optimization  with  respect  to  the  unknown 
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frequencies  can  be  substantially  reduced  if  the  frequencies  are  known  to  be  related  to  each 
other.  One  Ccise  is  broadband  detection  for  linear  arrays  in  which  a  broadband  signal  can 
be  modeled  as  a  sum  of  2-D  sinusoids.  Since  the  signal  is  known  to  arrive  from  a  certain 
direction,  we  know  that  the  frequencies  lie  along  a  “bearing  line.”  All  that  is  unknown  is 
the  bearing.  This  results  in  a  relatively  simple  1-D  optimization  over  the  possible  bearings. 
The  interested  reader  must  consult  [12]  for  further  details. 

C.  Fundamental  Frequency  Estimation:  Another  related  application  area  is  fundamental 
frequency  estimation  for  a  periodic  signal  in  colored  noise.  This  problem  arises  in  speech 
processing  as  pitch  estimation  in  colored  noise,  the  co-channel  problem  (for  sum  of  voiced  plus 
unvoiced  speech  segments)  [17],  in  biomedical  applications  as  electro-cardiogram  monitoring 
in  colored  noise,  etc.  We  develop  the  estimation  procedure  in  some  detail. 

Assume  that  we  observe  a  periodic  signal  in  Gaussian  AR  noise,  or 

arH  =  -h  tc[n]  ,  (31) 

«=i 

for  n  =  0,  1,  . . . ,  (iV  —  1).  The  fundamental  frequency  fo  is  to  be  estimated  along  with  the 
Fourier  series  coefficients  A,-  and  the  AR  parameters,  a[l],  a[2],  . . . ,  a[p},  The  number 
of  harmonics  s  is  given  by  [0.5/ /oj ,  where  [ij  denotes  the  largest  integer  less  than  or  equal 
to  X.  The  fact  that  the  number  of  nuisance  parameters  s  -|-  p  -f  1  to  be  estimated  increases 
with  decreasing  /o,  can  sometimes  cause  problems  with  the  ML  approach.  In  a  sense,  the 
model  order  changes  with  a  parameter  (Jo).  A  general  approach  to  the  problem  involves 
model  order  selection  [4].  In  any  event,  the  MLE  of  the  parameters  is  an  integral  part  of 
that  approach.  We  now  discuss  the  MLE,  further  details  are  available  in  [4]. 

The  approximate  MLE  of  the  fundamental  frequency  is  found  by  minimizing  (186),  i.e. 
minimizing 

r{f)  =  [P^  -  PwE  (e"P^E) E"P^] X  ,  (32) 

where  f  =  [fo 2fo  ■■■  sfo]^,  or  by  maximizing 

C(/o)  =  x"P^E  (E"Pi/E)  "*  E^Pjix  .  (33) 

At  low  SNR,  an  interpretation  of  (33)  is  to  whiten  the  data  (via  P^)  followed  by  summing 
the  energies  at  the  harmonics.  This  is  a  prewhitener  /  comb  filter  /  energy  detector.  The 
argument  is  similar  to  that  in  [12,  section  3]. 
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In  the  Ctise  of  a  periodic  signal  in  while  noise,  we  would  have  to  maximize 


C'(/o)  =  xfE(E''E)''E"x,  , 

(34) 

where  E  =  [ei  62  •  •  •  ej,  and  =  1  ej2ir*/o2  . . .  gj2»i/o(;v  i)  data  record  is 

large  enough  so  that  the  harmonic  components  are  orthogonal,  this  becomes 

a/o)«xfE(.VI)-‘E"x„  , 

(35) 

so  that 

i=i 

(36) 

which  is  recognized  as  a  comb  filter  /  energy  detector.  For  the  colored  noise  case,  however, 
we  must  maximize  (33)  over  /o,  a  simple  one  dimensional  search. 


We  now  illustrate  the  procedure  with  a  simple  numerical  example  using  iV  =  100  data 
samples.  A  periodic  signal  with  fundamental  /o  =  0.09  and  amplitudes  Aj  =  0.5,  Ai  = 
1.0,  A3  =  0.5,  A4  =  0.5  (with  s  =  4)  is  considered.  Colored  AR(1)  noise  with  a  pole  at 
0.95  and  <7^  =  1.0  was  added  to  the  signal  to  generate  the  observed  data.  The 

fundamental  frequency  was  estimated  by  minimizing  the  function  given  in  (33)  (the  pro¬ 
posed  method)  and  (36)  (the  usual  comb  filter  +  energy  detector)  using  a  line  search  over 
[—0.125,  0.125).  Note  that  we  must  have  —0.5  <  sfo  <  0.5.  The  signal  amplitudes  are 
then  computed  using  (20a)  and  the  estimated  fundamental  frequencies  to  reconstruct  the 
orignal  signal.  The  real  part  of  the  original  signal  and  the  reconstructed  versions  are  shown 
in  Figure  5.  The  marked  improvement  of  the  proposed  method  over  the  comb  filter  -f  energy 
detector  is  mainly  due  to  accurate  estimation  of  the  fundamental  frequency.  This  happens 
because  a  simple  comb  filter  +  energy  detector  picks  up  the  noise  power  and  estimates  the 
fundamental  to  be  near  the  noise  peak  (or  its  subharmonic),  unlike  the  proposed  method 
which  accounts  for  the  noise  coloration  by  appropriate  prewhitening. 

6  Conclusions 

Many  important  problems  in  statistical  signal  processing  involve  separating  an  observed  time 
series  into  a  deterministic  (signal)  component  and  a  random  (noise)  component.  By  mod¬ 
eling  the  noise  as  an  autoregressive  process,  and  the  signal  as  either  complex  exponentials, 
polynomials,  or  polynomials-tiraes-exponentials,  a  large  class  of  problems  can  be  addressed. 
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The  main  utility  is  due  to  the  enormous  reduction  in  estimator  complexity,  so  that  practical 
solutions  to  a  large  class  of  problems  is  now  possible.  Some  of  these  problems  are  discussed 
in  this  paper.  However,  many  questions  such  as  estimation  of  signal  modes,  model  order 
selection,  etc.  are  yet  to  be  investigated. 
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Appendix  A 

Equivalent  Forms  for  Minimum  Least  Squares  Error 

We  consider  the  generic  least  squares  (LS)  problem  of  minimizing 

=  (x  ~  —  H2S2)^(x  —  —  11202)  ■  (A  ~  1) 

The  solution  is  easily  obtained  by  letting  B  —  ^b\  H  =  [^i  H2I  so  that 

0  =  ,  (A -2) 

and  the  minimum  least  squares  error  is 

J(0„02)  =  x"P^x  ,  (A -3) 

where  =  I  —  H  is  the  orthogonal  complement  projection  operator. 

This  may  also  be  obtained  by  first  projecting  x  onto  the  subspace  spanned  by  the  columns 
of  Hi,  and  then  projecting  the  residual  onto  the  subspace  spanned  by  the  residual  columns 
of  H2. 

In  other  words,  minimizing  J  in  (A-1)  first  with  respect  to  0i  we  get 

«,(»,)  =  (H''H,)-'H''(x-H2»,)  ,  (A -4) 

and  substituting  into  (A-1)  produces 

J-(S,(«,),»,)  =  (Pfx-PfH,92)''(Pfx-Pi-H,«,)  ,  (A -5) 

where  Pf  “  I  —  Ht  Hi)  ^  Next  minimizing  J  in  { A-5)  with  respect  to  0,  we  get 

9,  =  (H''PfH,)-'H''P(-x  ,  (A  -  6) 

and 

J■(9„9l)  =  x''[P^-P^^H,(H''PfH^)■'H''P^]x  ,  (A -7) 

where  the  fact  that  P^  is  idempotent,  i.e.  =  P^  =  P^P^  was  used.  The  overall 

projection  operator  is  P;v  =  Pf  ~  Pii'H2  (h"P^H2)  H^P^-  which  projects  a  vector  onto 

the  orthogonal  complement  of  the  subspace  spanned  by  the  columns  of  [Hi  H2].  Finally, 
using  (A-6)  in  (A-4),  we  get  the  overall  solution  for  0i  as 

0i  =  (HfHi)-‘Hf  (X-H202)  .  (A -8) 

These  expressions.  (A-7)  and  (A-6),  (A-S),  were  used  to  minimize  (17)  in  section  3  to 
obtain  expressions  (18)  and  (19),  (20). 
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Appendix  B 

Alternative  Form  for  MLE  of  Single  Frequency 


Let  u  =  X  -|-  Ha,  denote  the  estimated  driving  noise  sequence,  where  a  is  the  estimate 
of  the  AR  parameters  which  accounts  for  the  presence  of  the  signal.  Also  let  Uq  =  P^x, 
denote  another  estimate  of  the  driving  noise  wherein  the  AR  parameters  are  estimated 
without  accounting  for  the  presence  of  the  signal  (i.e.,  using  usual  covariance  method  of  AR 
parameter  estimation).  Hence,  using  (20<2)  in  (206),  we  obtain 


u  = 


X  +  Ha 

=  X  -  H(H^H)“'H"  X  -  e  (e"P^/e)"'  e^P^x 
P„ee^P^x 


=  X  +  Hlui  + 


=  Uo  + 


Pffe  e^Up 
e^^Pfje 


and  thus 


e"u 


e^Uo  + 


e^P;y^e  e^Uo 


e«P^e 


ij  Li  ^ 

e"  e  e"  Uq 
e^^P^e 


Using  this  In  (21),  we  have 


C(/)  = 


X 


e^P^e 
!  e^Up  p 
e^Pjje 

(e^Uo)* 


e^up 

e^P;ye 


e^u 

e^uu^e 
effe  ■ 
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Appendix  C 

Invariance  Proof 


Let  3i[nj.  52^^])  ■  •  • '  -s.wfnj  he  a  basi?  for  ,5,  a  stibspace  in  C.v-  Then  by  definition  of  the 
invariant  subspace  if  j[n]  given  by 


~  a^s[n],  for  n  =  0.  1 . (A'  -  1)  , 

1=1 

€  5,  then  t/[n]  =  i[n]  +  —  fc]  e  P(S)  if  there  exist  /?,,  ,^2^ - J\f  such  that 


=  i3^s[n],  for  n  ~  p.  p+  I, - (,V  -  1)  . 

■=i 

Thus  by  definition,  for  any  L  £  C  and  x[n]  =  a^s[n]  €  S,  so  that 

y[n]  =  a^s[n]  +  ^  o{^]a^sfn  —  i’j 

k=l 

we  must  have  that  y[n]  £  P{S).  If  we  choose  a.n  L  £  C  such  that  a[l]  =  a[2]  =  •••  = 
a[p  —  1]  =  0,  then 

y[n]  =  a^s(n]  +  a[p|a^s[n  -  p] 


and  y[n]  €  ^(5)  iff 


s(n  -  p]  =  Aps[n] 

for  some  M  x  M  matrix  Ap.  Hence,  we  get 

p-i 

t/[n]  =  cr^  (I  +  a{p] Ap)  s[n]  +  Yi  -  k]  . 

k=\ 


(C-1) 


Now  choosing  each  of  the  remaining  a[t]’s  to  be  zero  except  a[;’|,  with  1  <  j  <  (p  —  I)  (and 
a(p]  ^  0  and  s[n  —  p]  satisfying  (C-l)),  leads  to 

s{n  -  j]  =  Ajs(n]  for  I  <  j  <  (p  -  1)  .  (C  -  2) 

Combining  (C-l)  and  (C-2),  we  have 


s|n  —  _/]  =.  A^s[n]  for  1  <  j  <  p  , 


(C-3) 
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and  for  n  =  p,  p  +  1 ,  . . . ,  (iV  —  1 ). 

But  these  conditions  are  all  satisfied  if  and  only  if 

s[n  —  1]  =  Ais(n)  for  1  <  n  <  (iV  —  1 )  ,  (C  —  4) 

with  Ai  being  nonsingular,  as  we  now  prove.  o 

Note  that  (C-3)  and  (C-4)  are  trivially  equivalent  for  p  =  1  so  that  only  p  >2  are  of  further 
interest.  Also  only  the  p  =  1  case  is  considered  in  [7,  p.  200], 

First,  we  assume  (C-4)  is  true  (if  part).  Hence  condition  (C-3)  for  j  =  I  \s  trivially  true. 
Rewriting  (C-4)  using  m  =  n  -t-  1,  we  get 

s[m  —  2)  =  Ais(m  —  1]  for  2  <  m  <  A 

=  AjSfm)  for  2  <  m  <  (A  —  1)  (C-5) 

Hence  condition  (C-3)  is  true  for  j  =  2.  Similarly,  we  get 

s[n  —  j]  =  Ais[n]  for  p  <  n  <  (A  —  1)  ,  (C  -  6) 

for  j  =  3,  4,  . . . ,  p,  i.e.,  condition  (C-3)  is  true,  with  Aj  =  A^. 

Next,  we  assume  (C-3)  is  true  (only  if  part).  Then  rewriting  (C-3)  for  j  =  1,  we  have 

s[n  —  1]  =  Ais[n]  for  p  <  n  <  (A  —  1)  ,  (C  —  7) 

so  that  one  needs  to  only  prove  that  (C-4)  holds  for  I  <  n  <  (p  —  1).  Considering  (C-3)  for 
j  =  2  and  n  =  p,  we  have 

s[p  -  2]  =  Ajsfp]  . 

But  A2  =  Aj,  as  we  will  prove  henceforth,  so  that 

s[p  -  2)  =  A?s[p]  =  A,  (Ais(p)) 

and  using  (C-3)  for  j  =  I  and  n  =  p,  i.e.,  s[p  —  1]  =  Ais(p],  we  get 

s[p-  2]  =  Ais[p-  1] 

Combining  the  above  result  with  (C-7),  we  get 

s[n  —  1]  =  Ais[n]  for  (p  —  1)  <  n  <  ( A  —  1)  .  (C  —  8) 
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Continuing  in  the  same  vein  using  A_,  =  A-J,  we  obtain  (C-  l)  for  1  <  n  <  (;V  —  1). 

VVe  now  prov'e  that  (C-3)  also  means  Aj  =  Aj,  as  used  above.  We  consider  the  proof  for 
Aj  =  Aj,  the  other  cases  are  similar.  Rewriting  (C-3)  for  j  =  1,  2,  we  have 


s[n  —  i]  =  Ais[nj  for  p  <  n  <  (.V  —  1)  {C-9) 

s[n  —  2]  =  A2s[nj  for  p  <  n  <  (iV  —  I )  (C-10) 

and  letting  m  =  n  -f  1  in  {C-9)  we  have 

s[m  —  2]  =  Aisfm— 1)  for  (p  +  1)  <  m  <  ( A  —  1 ) 

=  A^s[m]  for  (p  -1-  1)  <  m  <  ( A  —  I)  (C-11) 

after  using  (C-9)  again.  Combining  (C-10)  and  (C-11),  we  get 

(Aj  —  A2)  s[n]  =  0  for  (p  H-  1)  <  n  <  (A'  —  1)  .  (C  —  12) 


This  condition  is  satisfied  in  general  when  a  vector  s[n]  lies  in  the  null  space  of  the  M  x  M 
matrix  Aj  —  Aj.  We  however  have  M  linearly  independent  vectors  amongst  s[p],  s[p  -f 
1]  . . . ,  s[N  —  1]  since  the  columns  of  the  matrix 


s^[p] 

^l\p] 

52  b] 

5Mb] 

s^lp-b  1] 

5ib+ 1] 

S2b+  1] 

•••  5Afb+l] 

_s^[Ar-i] 

^s,[N-l] 

S2[N-1] 

•  •  •  sm[N  —  1]  _ 

form  a  basis  for  ■P(5)  (and  are  therefore  linearly  independent).  In  particular,  we  prove  (by 
contradiction)  that  the  last  M  vectors  must  be  linearly  independent,  or  that  the  vectors  in 
Q  =  {s[N  —  A/],  s[N  —  A/  -b  1],  ... ,  s{iV  —  1]}  must  be  linearly  independent.  From  (C-3), 

s[n  —  1]  =  Ais[n]  for  p  <  n  <  (Af  —  1)  , 

and  using  a  backward  recursion,  we  obtain 

s(fV-2]  =  A,s[iV-l] 
s[iV-3]  =  Als[N-l] 

s[N  -  M]  =  A^-‘s[A'  -  1]  . 
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Hence  the  set  of  vectors  become  (2  =  {  A'^''*s(A’ —  ij.  A''^~^s[A'  —  1] . Ais(iV  —  1],  s[.V  — 

1]}.  But,  by  the  Cayley- Hamilton  theorem,  any  matrix  satisfies  its  own  characteristic  equa¬ 
tion  or 

Ar  =  ^a,Ar'  , 

i=j 

so  that  the  vectors  s[A''  — Af  —  1].  s[N  —  M  —  2],  . . . ,  s[p),  must  all  be  linear  combinations  of  the 
vectors  in  Q.  Therefore,  if  the  vectors  in  Q  are  not  linearly  independent,  then  there  does  not 
exist  M  linearly  independent  vectors  amongst  s[p],  s[p+  1],  . . . ,  s[N  —  1|.  This  contradicts 
the  basic  assumption  that  the  sequences  si[nj,  S2[n],  . . . ,  s\i[n]  form  a  basis  for  V{S).  Hence 
the  vectors  in  Q  must  be  linearly  independent.  Finally,  as  long  as  M  <  N  —  p—  1,  for  (C-12) 
to  be  satisfied  with  M  linearly  independent  vectors  s[n|,  the  only  possibility  is  for  the  M  x  A/ 
matrix  Aj  —  A^  to  be  the  all-zero  matrix,  i.e.,  A2  =  A^.  The  proof  for  Aj  =  A{  is  along 
the  same  lines  except  that  there  are  N  —  p  —  j  -l-  1  vectors  satisfying 

^A{  —  Aj)  s[ti|  =  0  for  (p  +  i  -  1)  <  n  <  (A  —  1)  . 

so  that  we  need  M<N  —  p  —  j  +  i  for  all  1  <  j  <  p,  or  that  A/  <  A  —  2p  +  1 . 

We  finally  show  that  the  matrix  Aj  must  be  nonsingular.  Recall  that  s[n]  is  a  basis  for 
V{S),  i.e., 

a^sfn]  =  0  for  p<n<(A  —  1)  a  =  0 

a^s(n  —  1]  =  0  forp-J-l<n<A  <=4-  a  =  0 
and  using  (C-4)  we  have, 

a^Ais[n]  =  0  for  p 1  <  n  <  (A  —  1)  <=»  a=0 

^Afa)^s[n]  =  0  for  p-f  1  <  n  <  (A  -  1)  <=»  a=0 

Since  there  are  M  linearly  independent  vectors  amongst  s[p  -|-  1),  s[p  +  2],  . . . ,  s[A  —  1],  we 
have 

Aj'a  =  0  <.  ‘1-  a  =  0 

i.e.,  Ai  is  nonsingular.  o 

We  now  denote  D  =  A^"'  so  that  (C-4)  can  be  written  as 

s[n]  =  Aj’^sfn  —  1]  =  Ds[n  —  1]  for  1  <  n  <  (A  —  1)  (C  --  13) 

where  D  is  nonsingular.  Solving  the  above  recursion,  we  get 

s{n]  =  D”s[0]  ,  (C  -  14) 
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for  n  =  0,  1.  . . . ,  (iV  —  1).  We  now  proceed  fo  find  all  basis  signals  6i[nj  (that  comprise  s[nj) 
which  satisfy  (C-i4j. 

Since  D  is  arbitrary  (only  nonsingular),  we  consider  its  Jordan  canonical  form,  i.e., 

D  =  QJQ-‘ 


where 


0 

Jr 


and  Q  is  a  A/  X  M  modal  matrix.  The  J^’s  are  m,  x  m,  matrices,  called  the  Jordan  blocks, 
which  are  of  the  form 


1  0  •••  0 

0  A-  I  •••  0 

0  0  0  •••  A. 


and  A,-  is  a  non-zero  (because  D  is  nonsingular)  complex  number,  and  M  =  If  is 

easily  seen  that  D"  =  QJ”Q“‘  so  that  (C-14)  becomes 


s[n]  =  QJ"Q-*s[0]  . 


Using  s'{n]  =  Q”^s[n]  as  an  alternate  basis  for  Cn,  we  can  rewrite  the  above  equation  as 


s'[n]  =  J"s'[0)  , 


and  dropping  the  primes,  we  have 


s(n]  = 


•*1 


T” 

•*2 


0 


Jr" 


s(0]  , 
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and  since  J”  is  block  diagonal,  it  is  enough  to  consider  the  form  of  J".  Denoting  V,  to  be  a 
m,  X  m,  matrix  with  ones  along  the  upper  second  diagonal  and  zero«  elsewhere,  i  e., 

’  0  1  0  •••  o' 

0  0  1  0 

V.=  ;  , 

0  0  0  1 

0  0  0  •••  0 

so  that 

J.  =  A.I  +  V. 

where  I  denotes  the  m;  x  m,  identity  matrix,  and  using  the  binomial  expansion  for  J"  we 
obtain, 
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Partitioning  s[n|  as  [sf[n|  •  •  ■  sj^[n)j  ,  where  s,[n]  is  m,  x  1.  the  solution  b«H:ornes 

s,[n|  =  J"s.[0]  . 

Examining  the  form  of  J"  more  closely,  and  denoting  s,[0]  =  [cqc,  ••  •  we  rewrite 

the  above  equation  as 


s,[n]  = 


jrs.[o] 

A” 


n 

1 


I  A" 


n—  I 


A" 


Co 

Cl 

C2 


Cl 

C2 

C3 


Cm, -2 


Cm,  — 1 

Finally,  the  signals  {A",  ( 


0 


n 

2 

n 

1 


in-2 


in-1 


A? 


C2 

C3 

C4 

0 

0 


Cm,  — 1 
0 
0 

0 

0 


^  i  l^-tm.-l) 


,  m,  -  1  / 


rn,-2)  * 


m.-3j  ' 


A? 


Co 

Cl 

Cm,-1 


n 

i' 

A"-i 

/  n  ^ 

1  .n-(mi-l) 

.  \  "i.  -  1 ; 

(C-16) 


A" 


n-2 


n 


A 


n-{m,-l) 


}  can  be  easily  shown 


”  I  A""*  (  ” 

.1/;  ’V2/  '  '  ’  Vmi-l. 

to  be  linearly  independent.  Since  we  can  a,ssume  without  loss  of  generality  that  ^  0 
{it  only  decides  the  dimensionality  of  the  subspace)  and  hence  the  matrix  is  nonsingular, 
the  elements  of  5i[n]  will  also  be  linearly  independent  and  will  span  the  same  subspace  as 

the  signals  {A”,  ^  *’  ^  ^  ^  which  is  a  m,  dimensional 

subspace  5,-.  The  number  of  these  subspaces  is  arbitrary  as  long  as  =  A/.  The 

entire  subspace  spanned  by  these  signals  (with  different  A,),  i.e.,  S,  is  the  direct  sum  of  the 
oubspaces,  5,,  or  «S  =  5i  ©  52  ®  •  •  •  0  5^.  o 
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Appendix  D 

General  Parameter  Transformation 

To  determine  the  general  parameter  transformation,  consider  the  signal 

A?-' . 

t=;l  j=0  \J  / 

Since  each  subspace  Si  is  invariant,  we  need  to  consider  only 


\J  / 


(D-l) 


(D-2) 


Let  B,  =  Bq^  ,  and  since  the  first  row  of  J"  contains  the  basis  signals,  we 

obtain 

s,[n]  =  Bfs,[n] 


"U-' 


BT  Tnr_ 


where  ei  =  [1  00  •  •  ■  0]^.  But, 


m,-l)  ’ 


=  a{A:]s,[n  -  k] 

k=0 

=  E-WBfjr*’''! 


=  Dfs,|nl 
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where 


for  z  =  1,2,  . .  . ,  r. 


D.=  £«[A-]J 


*.=0 
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Figure  4c:  AR  Spectral  Estimate  (modified  covariance  method) 
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Figure  4d:  Average  Spectral  Estimates  using  50  realizations 


